Graphics in R

Basic Plots

Setup

Run the Setup.R file.

If everything works correctly, you should see a plot:

ggplot2 In a Nutshell

  • Package for statistical graphics
  • Developed by Hadley Wickham
  • Designed to adhere to good graphical practices
  • Supports a wide variety of plot types
  • Constructs plots using the concept of layers
  • http://had.co.nz/ggplot2/ or Hadley’s book ggplot2: Elegant Graphics for Data Analysis for reference material

qplot Function

The qplot() function is the basic workhorse of ggplot2

  • Produces all plot types available with ggplot2
  • Allows for plotting options within the function statement
  • Creates an object that can be saved
  • Plot layers can be added to modify plot complexity

qplot Structure

The qplot() function has a basic syntax:

qplot(variables, plot type, dataset, options)

  • variables: list of variables used for the plot
  • plot type: specified with a geom = statement
  • dataset: specified with a data = statement
  • options: there are so, so many options!
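
As a rough illustration of how the four pieces fit together, the sketch below fills in the template with the built-in mtcars data (chosen here only as an example; it is not part of the workshop data). Note that qplot() has been deprecated since ggplot2 3.4.0, so recent versions will print a deprecation warning, though the call should still run.

```r
library(ggplot2)

# variables: wt (x) and mpg (y)
# plot type: geom = "point"
# dataset:   data = mtcars (built into R; used here only as an example)
# options:   main, xlab, ylab
p <- qplot(wt, mpg, geom = "point", data = mtcars,
           main = "Fuel economy by weight",
           xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# qplot() returns a plot object, so it can be stored and printed later
print(p)
```

Storing the result in p is what makes it possible to add further layers to the plot afterwards.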

Diamonds Data

Objective: Explore the diamonds data set (preloaded along with ggplot2) using qplot for basic plotting.

The data set was scraped from a diamond exchange company database. It contains the prices and attributes of over 50,000 diamonds.

Examining the Diamonds Data

What does the data look like?

Look at the top few rows of the diamond data frame to find out!

head(diamonds)
## # A tibble: 6 × 10
##   carat       cut color clarity depth table price     x     y     z
##   <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
## 2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
## 3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
## 4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
## 5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48

Basic Scatterplot

Basic scatter plot of diamond price vs. carat weight

qplot(carat, price, geom = "point", data = diamonds)

Another Scatterplot

Scatter plot of diamond price vs. carat weight showing the versatility of options in qplot

qplot(carat, log(price), geom = "point", data = diamonds, 
      alpha = I(0.2), color = color, 
      main = "Log price by carat weight, grouped by color") + 
  xlab("Carat Weight") + ylab("Log Price")

Your Turn

All of the “Your Turns” for this section will use the tips data set:

tips <- read.csv("https://bit.ly/2gGoiLR")
  1. Use qplot to build a scatterplot of the variables tip and total bill
  2. Use options within qplot to color points by smoker status
  3. Clean up the axis labels and add a main plot title

Solutions

  1. Scatterplot of the variables tip and total bill
qplot(data = tips, x = total_bill, y = tip)

Solutions

  2. Color points by smoker status
qplot(data = tips, x = total_bill, y = tip, color = smoker)

Solutions

  3. Pretty axis labels and title
qplot(data = tips, x = total_bill, y = tip, color = smoker,
      xlab = "Total Bill ($)", ylab = "Tip ($)", 
      main = "Tip left by patrons' total bill and smoking status")

Plotting Map Data

States Data

To make a map, load up the states data and take a look:

states <- map_data("state")
head(states)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>

Basic Map Data

What data is needed in order to plot a basic map?

  • Latitude/longitude points for all map boundaries
  • Which boundary group each lat/long point belongs to
  • The order to connect points within each group
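
To see why all three pieces matter, here is a minimal sketch with a made-up data frame (the squares object and its coordinates are hypothetical) describing two separate square outlines. Without the group variable, the path would also run from the last corner of the first square to the first corner of the second.

```r
library(ggplot2)

# Two unit squares; each row is one boundary point.
# `group` says which shape a point belongs to, and the row order
# within a group is the order in which the points are connected.
squares <- data.frame(
  long  = c(0, 1, 1, 0, 0,   2, 3, 3, 2, 2),
  lat   = c(0, 0, 1, 1, 0,   0, 0, 1, 1, 0),
  group = rep(c(1, 2), each = 5)
)

# One path per group: two closed squares, no stray connecting line
p <- qplot(long, lat, geom = "path", data = squares, group = group)
print(p)
```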

Basic Map Data

The states data has all necessary information

A Basic Map

A bunch of latitude longitude points…

qplot(long, lat, geom = "point", data = states)

A Bit Better Map

… that are connected with lines in a very specific order.

qplot(long, lat, geom = "path", data = states, group = group) + 
  coord_map()

Polygon vs Path

qplot(long, lat, geom = "polygon", data = states, group = group) + 
  coord_map()

Polygon vs Path

qplot(long, lat, geom = "polygon", 
      fill = I("white"), color = I("black"),
      data = states, group = group) + 
  coord_map()

Incorporating Information

  • Add other geographic information by adding geometric layers to the plot
  • Add non-geographic information by altering the fill color for each state
  • Use geom = "polygon" to treat states as solid shapes
  • Show numeric information with color shade/intensity
  • Show categorical information using color hue

Categorical Data

If a categorical variable is assigned as the fill color then qplot will assign different hues for each category.

Load in a state regions dataset:

statereg <- read.csv("https://bit.ly/2i0AFHK")
head(statereg)
##        State StateGroups
## 1 california        West
## 2     nevada        West
## 3     oregon        West
## 4 washington        West
## 5      idaho        West
## 6    montana        West

Joining Data

Join or merge the original states data with the new information.

The left_join function is used for merging**:

library(dplyr)
states.class.map <- left_join(states, statereg, by = c("region" = "State"))
head(states.class.map)
##        long      lat group order  region subregion StateGroups
## 1 -87.46201 30.38968     1     1 alabama      <NA>       South
## 2 -87.48493 30.37249     1     2 alabama      <NA>       South
## 3 -87.52503 30.37249     1     3 alabama      <NA>       South
## 4 -87.53076 30.33239     1     4 alabama      <NA>       South
## 5 -87.57087 30.32665     1     5 alabama      <NA>       South
## 6 -87.58806 30.32665     1     6 alabama      <NA>       South

** More on this later
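
One thing worth checking after a join like this is whether every region found a match, since left_join() keeps all rows of the first table and fills in NA where the key is missing from the second. A small sketch with made-up data (the map_pts and regions data frames are hypothetical stand-ins for states and statereg):

```r
library(dplyr)

# Toy stand-ins: three boundary points, and a lookup table missing "iowa"
map_pts <- data.frame(
  long   = c(-87.5, -93.6, -120.0),
  lat    = c(30.4, 42.0, 37.0),
  region = c("alabama", "iowa", "california"),
  stringsAsFactors = FALSE
)
regions <- data.frame(
  State       = c("alabama", "california"),
  StateGroups = c("South", "West"),
  stringsAsFactors = FALSE
)

# Same pattern as above: match map_pts$region against regions$State
joined <- left_join(map_pts, regions, by = c("region" = "State"))

# Every row of map_pts is kept; unmatched keys get NA
joined

# anti_join() lists the rows of map_pts with no match in regions
unmatched <- anti_join(map_pts, regions, by = c("region" = "State"))
unmatched$region
```

Keys are compared exactly, so "Alabama" and "alabama" would not match. An unmatched state would show up on the map with a missing fill value.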

Plotting the Result

qplot(long, lat, geom = "polygon", data = states.class.map, 
      group = group, fill = StateGroups, color = I("black")) + 
  coord_map() 

Numerical Data & Maps

  • Behavioral Risk Factor Surveillance System
  • 2008 telephone survey run by the Centers for Disease Control and Prevention (CDC)
  • Asks a variety of questions related to health and wellness
  • Cleaned data with state aggregated values posted on website

BRFSS Data Aggregated by State

states.stats <- read.csv("https://bit.ly/2gT95Hc")
head(states.stats)
##   state.name   avg.wt avg.qlrest2   avg.ht  avg.bmi avg.drnk
## 1    alabama 180.7247    9.051282 168.0310 29.00222 2.333333
## 2     alaska 189.2756    8.380952 172.0992 28.90572 2.323529
## 3    arizona 169.6867    5.770492 168.2616 27.04900 2.406897
## 4   arkansas 177.3663    8.226619 168.7958 28.02310 2.312500
## 5 california 170.0464    6.847751 168.1314 27.23330 2.170000
## 6   colorado 167.1702    8.134715 169.6110 26.16552 1.970501

Join the data again

states.map <- left_join(states, states.stats, by = c("region" = "state.name"))
head(states.map)
##        long      lat group order  region subregion   avg.wt avg.qlrest2
## 1 -87.46201 30.38968     1     1 alabama      <NA> 180.7247    9.051282
## 2 -87.48493 30.37249     1     2 alabama      <NA> 180.7247    9.051282
## 3 -87.52503 30.37249     1     3 alabama      <NA> 180.7247    9.051282
## 4 -87.53076 30.33239     1     4 alabama      <NA> 180.7247    9.051282
## 5 -87.57087 30.32665     1     5 alabama      <NA> 180.7247    9.051282
## 6 -87.58806 30.32665     1     6 alabama      <NA> 180.7247    9.051282
##    avg.ht  avg.bmi avg.drnk
## 1 168.031 29.00222 2.333333
## 2 168.031 29.00222 2.333333
## 3 168.031 29.00222 2.333333
## 4 168.031 29.00222 2.333333
## 5 168.031 29.00222 2.333333
## 6 168.031 29.00222 2.333333

Shade and Intensity

Average # of days in the last 30 days of insufficient sleep

qplot(long, lat, geom = "polygon", data = states.map, 
      group = group, fill = avg.qlrest2) + coord_map()

BRFSS Data by Gender and State

states.sex.stats <- read.csv("https://srvanderplas.github.io/NPPD-Analytics-Workshop/02.Graphics/data/states.sex.stats.csv")
states.sex.stats <- read.csv("https://bit.ly/2hiKFIb")
head(states.sex.stats)
##   state.name SEX   avg.wt avg.qlrest2   avg.ht  avg.bmi avg.drnk    sex
## 1    alabama   1 198.8936    8.648936 177.5729 28.50714 3.033333   Male
## 2    alabama   2 173.0315    9.224771 163.9956 29.21280 2.041667 Female
## 3     alaska   1 203.3919    7.236111 178.3896 28.91494 2.487179   Male
## 4     alaska   2 169.5660    9.907407 163.1296 28.89286 2.103448 Female
## 5    arizona   1 191.3739    5.163793 177.1724 27.63152 2.814286   Male
## 6    arizona   2 156.2054    6.142857 162.7043 26.67683 2.026667 Female

One More Join

states.sex.map <- left_join(states, states.sex.stats, by = c("region" = "state.name"))
head(states.sex.map)
##        long      lat group order  region subregion SEX   avg.wt
## 1 -87.46201 30.38968     1     1 alabama      <NA>   1 198.8936
## 2 -87.46201 30.38968     1     1 alabama      <NA>   2 173.0315
## 3 -87.48493 30.37249     1     2 alabama      <NA>   1 198.8936
## 4 -87.48493 30.37249     1     2 alabama      <NA>   2 173.0315
## 5 -87.52503 30.37249     1     3 alabama      <NA>   1 198.8936
## 6 -87.52503 30.37249     1     3 alabama      <NA>   2 173.0315
##   avg.qlrest2   avg.ht  avg.bmi avg.drnk    sex
## 1    8.648936 177.5729 28.50714 3.033333   Male
## 2    9.224771 163.9956 29.21280 2.041667 Female
## 3    8.648936 177.5729 28.50714 3.033333   Male
## 4    9.224771 163.9956 29.21280 2.041667 Female
## 5    8.648936 177.5729 28.50714 3.033333   Male
## 6    9.224771 163.9956 29.21280 2.041667 Female
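
Because states.sex.stats has two rows per state (one per sex), each boundary point now appears twice, which is why the rows above alternate between Male and Female. The sketch below reproduces that effect with made-up data (both data frames and their values are hypothetical). Recent dplyr versions (1.1.0 and later) may warn about a many-to-many relationship for a join like this; here the duplication is intended.

```r
library(dplyr)

# Two boundary points for one state
pts <- data.frame(
  long   = c(-87.46, -87.48),
  lat    = c(30.39, 30.37),
  region = c("alabama", "alabama"),
  stringsAsFactors = FALSE
)
# Two summary rows for the same state, one per sex (values invented)
by_sex <- data.frame(
  state.name = c("alabama", "alabama"),
  sex        = c("Male", "Female"),
  avg.drnk   = c(3.0, 2.0),
  stringsAsFactors = FALSE
)

both <- left_join(pts, by_sex, by = c("region" = "state.name"))

# 2 points x 2 sexes = 4 rows: a full copy of the outline for each sex
nrow(both)
```

Having one complete outline per sex is what allows a separate map to be drawn for each group.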

Adding Information

Average # of alcoholic drinks per day by state and gender

qplot(long, lat, geom = "polygon", data = states.sex.map, 
      group = group, fill = avg.drnk) + coord_map() + 
  facet_grid(sex ~ .)

Your Turn

  • Use left_join to combine child healthcare data with maps information.
    You can load in the child healthcare data with:
states.health.stats <- read.csv("https://bit.ly/2hRBMq0")
  • Use qplot to create a map of child healthcare undercoverage rate by state

Solutions

library(maps)
library(dplyr)
states <- map_data("state")
states.health.map <- left_join(states, states.health.stats, 
                               by = c("region" = "state.name"))

# Use qplot to create a map of child healthcare undercoverage 
# rate by state

qplot(data = states.health.map, x = long, y = lat, 
      geom = 'polygon', group = group, 
      fill = no.coverage) + coord_map()

Cleaning Up Maps

Use ggplot2 options to clean up the map!

  • Adding Titles + ggtitle(...)
  • Might want a plain white background + theme_bw()
  • Extremely familiar geography may eliminate need for latitude and longitude axes + theme(...)
  • Want to customize color gradient + scale_fill_gradient2(...)
  • Keep aspect ratios correct + coord_map()

Cleaned Up Map

qplot(long, lat, geom = "polygon", data = states.map, 
      group = group, fill = avg.drnk) + 
  coord_map() +  theme_bw() +
  scale_fill_gradient2(
    name = "Avg Drinks",
    limits = c(1.5, 3.5), 
    low = "lightgray", high = "red") + 
  theme(axis.ticks = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank()) +
  ggtitle("Average Number of Alcoholic Beverages\nConsumed Per Day by State")

Your Turn

Use options to polish the look of your map of child healthcare undercoverage rate by state!

Solutions

qplot(data = states.health.map, x = long, y = lat, 
      geom = 'polygon', group = group, fill = no.coverage) + 
  coord_map() + 
  scale_fill_gradient2(
    name = "Child\nHealthcare\nUndercoverage",
    limits = c(0, .2), 
    low = 'white', high = 'red') + 
  ggtitle(paste("Health Insurance in the U.S.",
                "Which states have the highest rates",
                "of undercovered children?", sep = "\n")) +
  theme_minimal() + 
  theme(panel.grid = element_blank(), 
        axis.text = element_blank(),
        axis.title = element_blank())   

Plotting Using Layers

Deepwater Horizon Oil Spill

Datasets

NOAA Data:

  • National Oceanic and Atmospheric Administration
  • Temperature and Salinity in the Gulf of Mexico
  • Measured using Floats, Gliders and Boats

Datasets

U.S. Fish and Wildlife Service Data:

  • Animal Sightings on the Gulf Coast
  • Birds, Turtles and Mammals
  • Status: Oil Covered or Not

Both data sets have geographic coordinates for every observation

Loading NOAA Data

NOAA data is a .rdata file. Read it in:

  1. Download the data here
  2. Run the getwd() command to find your current working directory
  3. Place noaa.rdata in the directory from step 2.
  4. Run the command below:
load("noaa.rdata")
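
load() restores objects under the names they were saved with, so it may not be obvious what has appeared in the workspace. A minimal sketch of one way to check, using a temporary file and made-up objects rather than noaa.rdata itself:

```r
# Save two made-up objects to a temporary .rdata file
demo_floats <- data.frame(Latitude = 24.8, Longitude = -88.0)
demo_rig    <- data.frame(x = -88.4, y = 28.7)
tmp <- tempfile(fileext = ".rdata")
save(demo_floats, demo_rig, file = tmp)

# Load into a fresh environment so nothing in the workspace is overwritten.
# load() returns (invisibly) the names of the objects it restored.
env    <- new.env()
loaded <- load(tmp, envir = env)
loaded
```

With the real file, storing the result of load("noaa.rdata") in a variable and printing it would list whatever objects the file contains.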

Floats

Take a peek at the top of the floats NOAA data:

head(floats, n = 2)[,1:5]
##   callSign Date_Time JulianDay Time_QC Latitude
## 1 Q4901043 7/12/2010   2455390       1   24.823
## 2 Q4901043 7/12/2010   2455390       1   24.823
head(floats, n = 2)[,6:10]
##   Longitude Position_QC Depth Depth_QC Temperature
## 1   -87.964           1     2        1       29.83
## 2   -87.964           1     4        1       29.65
head(floats, n = 2)[,11:14]
##   Temperature_QC Salinity Salinity_QC  Type
## 1              1    36.59           1 Float
## 2              1    36.58           1 Float

Floats

qplot(Longitude, Latitude, color = callSign, data = floats) + 
  coord_map()

Gliders

qplot(Longitude, Latitude, color = callSign, data = gliders) + 
  coord_map()

Boats

qplot(Longitude, Latitude, color = callSign, data = boats) + 
  coord_map()

Layering

This data has the same context - a common time and common place

  • Want to aggregate information from different sources onto a common plot
  • Start with a common background: the lat/long grid
  • Superimpose data onto the grid in layers using ggplot2

Layers Preview

ggplot() +
  geom_path(data = states, aes(x = long, y = lat, group = group)) + 
  geom_point(data = floats, aes(x = Longitude, y = Latitude, color = callSign)) +   
  geom_point(aes(x, y), shape = "x", size = 5, data = rig) + 
  geom_text(aes(x, y), label = "BP Oil Rig", 
            size = 5, data = rig, hjust = -0.1) + 
  xlim(c(-91, -80)) + ylim(c(22,32)) + coord_map()

More Layering

  • Most maps (and many plots) have multiple layers of data.
  • The layers may be from the same or different datasets.
  • ggplot2 makes it easy to add layers to a plot.

What is a Plot?

  • A default dataset
  • A coordinate system
  • Layers of geometric objects (geoms)
  • A set of aesthetic mappings (taking information from the data and converting into an attribute of the plot)
  • A scale for each aesthetic
  • A facetting specification (multiple plots based on subsetting the data)

Floats Decomposed

Data: floats, states

Mappings:

  aesthetic   mapping
  x           Longitude
  y           Latitude
  color       callSign

Scales:

  aesthetic   scale
  x           continuous
  y           continuous
  color       discrete

Geoms: Points (floats), lines (states)

Facetting: None

qplot vs ggplot

qplot() stands for “quickplot”:

  • Automatically chooses default settings to make life easier
  • Less control over plot construction

ggplot() stands for “grammar of graphics plot”

  • Constructs the plot using components listed in previous slides

qplot vs ggplot

Two ways to construct the same plot for float locations:

qplot(Longitude, Latitude, color = callSign, data = floats) 

Or:

ggplot(data = floats, 
       aes(x = Longitude, y = Latitude, color = callSign)) +
  geom_point() + 
  scale_x_continuous() + 
  scale_y_continuous() + 
  scale_color_discrete()

Brevity

Even ggplot will automatically pick default scales:

ggplot(data = floats, 
       aes(x = Longitude, y = Latitude, color = callSign)) +
  geom_point()

Your Turn

Find the ggplot() statement that creates this plot:

Hint: look at the Floats data for variable ideas

Solutions

ggplot(aes(x = Depth, y = Temperature, color = callSign), 
       data = floats) + 
  geom_point()

What is a Layer?

A layer added to ggplot() can be a geom…

  • The type of geometric object
  • The statistic mapped to that object
  • The data set from which to obtain the statistic

… or a position adjustment to the scales

  • Changing the axes scale
  • Changing the color gradient

Layer Examples

  Plot                 Geom                Stat
  Scatterplot          point               identity
  Histogram            bar                 bin count
  Smoother             line + ribbon       smoother function
  Binned Scatterplot   rectangle + color   2d bin count

More geoms described at http://docs.ggplot2.org/current/
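
The geom/stat pairings in the table can be tried out directly. The sketch below uses the diamonds data from earlier; layer_data() is a ggplot2 helper that returns the data a layer actually draws, which makes the effect of the stat visible.

```r
library(ggplot2)

# Scatterplot: point geom, identity stat (data drawn as-is)
p_point <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

# Histogram: bar geom, bin stat (counts per bin are computed for us)
p_hist <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(bins = 30)

# Smoother: line + ribbon geom, smoothing stat
p_smooth <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_smooth()

# Binned scatterplot: rectangle geom with fill colour, 2d bin stat
p_bin2d <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d()

# The histogram layer holds one row per bin, not one row per diamond,
# but the bin counts still add up to the number of diamonds
hist_data <- layer_data(p_hist)
nrow(hist_data)
sum(hist_data$count)
```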

Piecing Things Together

Build a map using NOAA data

  • Coordinate system (mapping Long-Lat to X-Y)
  • Add layer of state outlines
  • Add layer of points for float locations
  • Add layers for Oil Rig marker and label
  • Adjust the range of x and y scales

The Result

ggplot() +
  geom_path(data = states, aes(x = long, y = lat, group = group)) + 
  geom_point(data = floats, aes(x = Longitude, y = Latitude, color = callSign)) +   
  geom_point(aes(x, y), shape = "x", size = 5, data = rig) + 
  geom_text(aes(x, y), label = "BP Oil Rig", size = 5, data = rig, hjust = -0.1) + 
  xlim(c(-91, -80)) + 
  ylim(c(22, 32)) + coord_map()

Your Turn

  1. Read in the animal.csv data:
    (Data of animal sightings around the Deepwater Site)
animal <- read.csv("https://bit.ly/2hNlTUl")
  2. Plot the location of animal sightings on a map of the region
  3. On this plot, try to color points by class of animal and/or status of animal
  4. Advanced: Is there a way to indicate time?
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
animal$month <- month(as.Date(animal$Date_))
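
The call above relies on as.Date() recognising the format of Date_. If the dates turn out to be stored as month/day/year text (as Date_Time is in the floats data), an explicit format is likely needed. A small sketch with a made-up date string:

```r
library(lubridate)

# A made-up date string in month/day/year form
raw <- "7/12/2010"

# Tell as.Date() how the text is laid out, then extract the month
d <- as.Date(raw, format = "%m/%d/%Y")
month(d)                 # month as a number
month(d, label = TRUE)   # month as an ordered factor with abbreviated names
```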

Solutions

  2. Plot the location of animal sightings on a map
ggplot() + 
  geom_path(data = states, aes(x = long, y = lat, group = group)) + 
  geom_point(data = animal, aes(x = Longitude, y = Latitude)) + 
  xlim(c(-91, -80)) + ylim(c(24,32)) + coord_map()

Solutions

  3. On this plot, try to color points by class of animal and/or status of animal
ggplot() + 
  geom_path(data = states, aes(x = long, y = lat, group = group)) + 
  geom_point(data = animal, aes(x = Longitude, y = Latitude,    
                                color = class)) + 
  xlim(c(-91, -80)) + ylim(c(24,32)) + coord_map()

Solutions

  3. On this plot, try to color points by class of animal and/or status of animal
ggplot() + 
  geom_path(data = states, aes(x = long, y = lat, group = group)) + 
  geom_point(data = animal, aes(x = Longitude, y = Latitude,    
                                color = Condition)) + 
  xlim(c(-91, -80)) + ylim(c(24,32)) + coord_map()

Solutions

  4. Advanced: Is there a way to indicate time?
ggplot() + 
  geom_path(data = states, aes(x = long, y = lat, group = group)) + 
  geom_point(data = animal, aes(x = Longitude, y = Latitude,    
                                color = Condition), alpha = .5) +
  xlim(c(-91, -80)) + ylim(c(24,32)) +
  facet_wrap(~month) + coord_map()  

Perception

Motivation

Why are some plots easier to read than others?

Cost of an Education

Good Graphics

Graphics consist of:

  • Structure (boxplot, scatterplot, etc.)
  • Aesthetics: features such as
    • color
    • shape
    • size
      that map other characteristics to structural features

Both the structure and aesthetics should help viewers interpret the information.

Pre-Attentive Features

  • Things that “jump out” in less than 250 ms
  • Color, form, movement, spatial localization

Hierarchy of Features

  • Color is stronger than shape
  • Combinations of pre-attentive features are usually not pre-attentive due to interference

Your Turn

Find ways to improve the following graphic:

frame <- read.csv("https://bit.ly/2i3Q4Gf")
qplot(x, y, data = frame, shape = g1, colour = g2, size = I(4))

  • Make sure the “oddball” stands out while keeping the information on the groups
  • Hint: interaction combines factor variables
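
For reference, a small sketch of what interaction() produces, using made-up factors rather than the frame data:

```r
# Two made-up grouping factors
g1 <- factor(c("a", "a", "b", "b"))
g2 <- factor(c("x", "y", "x", "y"))

# interaction() builds one factor whose levels are the combinations
inter <- interaction(g1, g2)
as.character(inter)
levels(inter)
```

Each observation gets a single label that identifies its combination of the two groups, so one aesthetic can encode both.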

Solutions

# Make sure the "oddball" stands out while keeping the 
# information on the groups
frame$inter <- interaction(frame$g1, frame$g2)
ggplot(frame, aes(x, y)) +  
  geom_point(aes(shape = g1, color = inter), size = I(4))

Solutions

# Make sure the "oddball" stands out while keeping the 
# information on the groups
frame$inter <- interaction(frame$g1, frame$g2)
ggplot(frame, aes(x, y)) +  
  geom_point(aes(shape = g1, fill = g2, color = inter), size = I(4), stroke = I(2)) + 
  scale_shape_manual(values = c(21,23)) + 
  scale_fill_manual(values = c("red", "green")) + 
  scale_colour_manual(values = c("red", "black", "green")) + 
  guides(fill = guide_legend(override.aes = list(color = c("red", "green"))),
         colour = guide_legend(override.aes = list(fill = "white", shape = 22)))

Accuracy of Perception

  1. Position (common scale)
    e.g. bar chart, scatter plot, line graph
  2. Position (non-aligned scale)
    e.g. stacked bar chart
  3. Length, Direction, Angle, Slope
  4. Area
  5. Volume, Density, Curvature
  6. Shading, Color Saturation, Color Hue

Accuracy of Perception

Using the previous list, which is a more accurate way to display the same data:

  1. A pie chart
  2. A bar chart

A bar chart displays information on a common aligned scale. A pie chart requires comparisons of angles and/or area, which are less accurate.
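
To compare the two displays yourself, the sketch below draws the same made-up counts both ways (the shares data frame is hypothetical). In ggplot2 a pie chart is a single stacked bar drawn in polar coordinates.

```r
library(ggplot2)

# Made-up counts with similar values, which are hard to rank in a pie
shares <- data.frame(
  category = c("A", "B", "C", "D"),
  count    = c(23, 25, 27, 25)
)

# Bar chart: values are compared as positions on a common scale
bar <- ggplot(shares, aes(x = category, y = count)) +
  geom_col()

# Pie chart: one stacked bar, with the y axis wrapped around a circle
pie <- ggplot(shares, aes(x = "", y = count, fill = category)) +
  geom_col(width = 1) +
  coord_polar(theta = "y")

print(bar)
print(pie)
```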

Accuracy of Perception

If you have observations of height and weight over time, and height is more important than weight, how would you construct your plot?

  aesthetic   variable
  x           time
  y           height
  color       weight

Since height is more important to show, it should be displayed using a position scale. Weight is less important, so it should be shown using a color scale.

Aesthetics in ggplot2

Main parameters: alpha, shape, color, size

Ordering Variables

  • Position:
    higher is larger (y), items to the right are larger (x)
  • Size, Area
  • Color: not always ordered. More contrast = larger
  • Shape: Unordered.

Color

  • Hue: shade of color (red, orange, yellow…)
  • Intensity: amount of color
  • Both hue and intensity are pre-attentive.
    Bigger contrast corresponds to faster detection.

More Color

Color is context-sensitive:

A and B are the same intensity and hue, but appear to be different.

Gradients

Qualitative schemes: no more than 7 colors

Quantitative schemes: use color gradient with only one hue for positive values

More Gradients

Quantitative schemes: use color gradient with two hues for positive and negative values.
Gradient should go through a light, neutral color (white) corresponding to 0.

Small objects or thin lines need more contrast than larger areas

RColorBrewer

R package based on Cynthia Brewer’s color schemes (http://www.colorbrewer2.org)

Color in ggplot2

  • Factor variable:
    • scale_colour_discrete
    • scale_colour_brewer(palette = ...)
  • Continuous variable:
    • scale_colour_gradient (define low, high values)
    • scale_colour_gradient2 (define low, mid, and high values)
    • Equivalents for fill: scale_fill_...
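
A brief sketch of the continuous scales in use, with made-up data (the tiles data frame and its values are hypothetical). The first plot has only positive values and uses a single-hue gradient; the second has values on both sides of zero and uses a two-hue gradient through white.

```r
library(ggplot2)

# Made-up tiles: `pos` is all positive, `change` is centred on zero
tiles <- data.frame(
  x      = 1:5,
  y      = 1,
  pos    = c(1, 2, 3, 4, 5),
  change = c(-2, -1, 0, 1, 2)
)

# One hue, light to dark, for positive values
p_seq <- ggplot(tiles, aes(x, y, fill = pos)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red")

# Two hues meeting at a neutral midpoint for signed values
p_div <- ggplot(tiles, aes(x, y, fill = change)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0)

print(p_seq)
print(p_div)
```

The same arguments apply to scale_colour_gradient and scale_colour_gradient2 when the variable is mapped to colour rather than fill.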

Your Turn

  • In the diamonds data, clarity is ordinal, while price and carat are continuous
  • Find a graphic that gives an overview of these three variables while respecting their types
  • Hint: Start with the following code
qplot(carat, price, colour = clarity, data = diamonds)

Solutions

qplot(carat, price, colour = clarity, data = diamonds) + 
  scale_colour_brewer(palette = "BuGn")

Facetting

  • A way to extract subsets of data and place them side-by-side in graphics
  • qplot Syntax: facets = row ~ col. Use . if there is no variable for either row or column (i.e. facets = . ~ col)
  • ggplot Syntax: + facet_wrap(~ variable) or + facet_grid(row ~ col)
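
A short sketch of the two ggplot forms, using the built-in mtcars data purely as an example:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# facet_wrap: one variable, panels wrapped into rows as needed
p_wrap <- p + facet_wrap(~ cyl)

# facet_grid: rows by one variable, columns by another
p_grid <- p + facet_grid(am ~ cyl)

# Use . when there is no row (or column) variable
p_cols <- p + facet_grid(. ~ cyl)

print(p_grid)
```

Here cyl has three values and am has two, so p_wrap has three panels and p_grid has six.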

Facetting

qplot(price, carat, data = diamonds, color = color, 
      facets = . ~ clarity)

Your Turn

The movies dataset contains information from IMDB.com including ratings, genre, length in minutes, and year of release.

movies <- read.csv("https://bit.ly/2hqhCoM")
  • Explore the differences in length, rating, etc. in movie genres over time
  • Hint: use facetting!

Solutions

ggplot(movies, aes(x = year, y = budget, 
                   group = genre, color = genre)) + 
  geom_point()

Solutions

ggplot(movies, aes(x = year, y = budget, 
                   group = genre, color = genre)) + 
  geom_point(alpha = I(.2)) + 
  facet_wrap(~genre)

Solutions

ggplot(movies, aes(x = genre, fill = mpaa)) + geom_bar() 

Solutions

ggplot(movies, aes(x = year, y = length, 
                   group = genre, color = genre)) +
  geom_smooth(fullrange = FALSE) + 
  coord_cartesian(ylim = c(0, 150))
## `geom_smooth()` using method = 'gam'

Solutions

ggplot(movies, aes(x = budget, y = rating, group = genre)) + 
  geom_point(alpha = .1) +
  facet_grid(mpaa ~ genre) + 
  geom_smooth(method = "lm", se = FALSE) + 
  scale_x_log10()

Polishing Plots

Visual Appearance

This section focuses on the details of plots - background colors, appearance, fonts, etc.

These details allow for highly customized plots.

Plot Title

qplot(carat, price, data = diamonds) +
    ggtitle("Price vs Carat for Diamonds")

Built-In Themes

qplot(carat, price, data = diamonds)
qplot(carat, price, data = diamonds) + theme_bw()

Setting Themes

theme_set specifies a default theme for all plots:

theme_set(theme_bw())
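
Since theme_set() changes the default for the rest of the session, it can be useful to keep the previous theme around. theme_set() returns the old theme invisibly, so a set-and-restore pattern might look like the sketch below (mtcars is used only as example data):

```r
library(ggplot2)

# Switch the default theme, keeping the previous one
old <- theme_set(theme_bw())

# Plots printed from here on use theme_bw() unless they set their own theme
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
print(p)

# Put the earlier default back
theme_set(old)
```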

Setting Themes

It is also possible to compare the options for each theme:

theme_bw()
## List of 57
##  $ line                 :List of 6
##   ..$ colour       : chr "black"
##   ..$ size         : num 0.5
##   ..$ linetype     : num 1
##   ..$ lineend      : chr "butt"
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ rect                 :List of 5
##   ..$ fill         : chr "white"
##   ..$ colour       : chr "black"
##   ..$ size         : num 0.5
##   ..$ linetype     : num 1
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ text                 :List of 11
##   ..$ family       : chr ""
##   ..$ face         : chr "plain"
##   ..$ colour       : chr "black"
##   ..$ size         : num 11
##   ..$ hjust        : num 0.5
##   ..$ vjust        : num 0.5
##   ..$ angle        : num 0
##   ..$ lineheight   : num 0.9
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 0 0 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.x         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 5.5 0 0 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.x.top     :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 0
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 0 5.5 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.y         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : num 90
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 5.5 0 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.y.right   :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 0
##   ..$ angle        : num -90
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 0 0 5.5
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text            :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : chr "grey30"
##   ..$ size         :Class 'rel'  num 0.8
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.x          :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 2.2 0 0 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.x.top      :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : num 0
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 0 2.2 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.y          :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 1
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 2.2 0 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text.y.right    :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 0
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 0 0 2.2
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.ticks           :List of 6
##   ..$ colour       : chr "grey20"
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ lineend      : NULL
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ axis.ticks.length    :Class 'unit'  atomic [1:1] 2.75
##   .. ..- attr(*, "valid.unit")= int 8
##   .. ..- attr(*, "unit")= chr "pt"
##  $ axis.line            : list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ axis.line.x          : NULL
##  $ axis.line.y          : NULL
##  $ legend.background    :List of 5
##   ..$ fill         : NULL
##   ..$ colour       : logi NA
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ legend.margin        :Classes 'margin', 'unit'  atomic [1:4] 0.2 0.2 0.2 0.2
##   .. ..- attr(*, "valid.unit")= int 1
##   .. ..- attr(*, "unit")= chr "cm"
##  $ legend.spacing       :Class 'unit'  atomic [1:1] 0.4
##   .. ..- attr(*, "valid.unit")= int 1
##   .. ..- attr(*, "unit")= chr "cm"
##  $ legend.spacing.x     : NULL
##  $ legend.spacing.y     : NULL
##  $ legend.key           :List of 5
##   ..$ fill         : chr "white"
##   ..$ colour       : logi NA
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ legend.key.size      :Class 'unit'  atomic [1:1] 1.2
##   .. ..- attr(*, "valid.unit")= int 3
##   .. ..- attr(*, "unit")= chr "lines"
##  $ legend.key.height    : NULL
##  $ legend.key.width     : NULL
##  $ legend.text          :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         :Class 'rel'  num 0.8
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ legend.text.align    : NULL
##  $ legend.title         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : num 0
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ legend.title.align   : NULL
##  $ legend.position      : chr "right"
##  $ legend.direction     : NULL
##  $ legend.justification : chr "center"
##  $ legend.box           : NULL
##  $ legend.box.margin    :Classes 'margin', 'unit'  atomic [1:4] 0 0 0 0
##   .. ..- attr(*, "valid.unit")= int 1
##   .. ..- attr(*, "unit")= chr "cm"
##  $ legend.box.background: list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  $ legend.box.spacing   :Class 'unit'  atomic [1:1] 0.4
##   .. ..- attr(*, "valid.unit")= int 1
##   .. ..- attr(*, "unit")= chr "cm"
##  $ panel.background     :List of 5
##   ..$ fill         : chr "white"
##   ..$ colour       : logi NA
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ panel.border         :List of 5
##   ..$ fill         : logi NA
##   ..$ colour       : chr "grey20"
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ panel.spacing        :Class 'unit'  atomic [1:1] 5.5
##   .. ..- attr(*, "valid.unit")= int 8
##   .. ..- attr(*, "unit")= chr "pt"
##  $ panel.spacing.x      : NULL
##  $ panel.spacing.y      : NULL
##  $ panel.grid.major     :List of 6
##   ..$ colour       : chr "grey92"
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ lineend      : NULL
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ panel.grid.minor     :List of 6
##   ..$ colour       : chr "grey92"
##   ..$ size         : num 0.25
##   ..$ linetype     : NULL
##   ..$ lineend      : NULL
##   ..$ arrow        : logi FALSE
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_line" "element"
##  $ panel.ontop          : logi FALSE
##  $ plot.background      :List of 5
##   ..$ fill         : NULL
##   ..$ colour       : chr "white"
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ plot.title           :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         :Class 'rel'  num 1.2
##   ..$ hjust        : num 0
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 0 6.6 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ plot.subtitle        :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         :Class 'rel'  num 0.9
##   ..$ hjust        : num 0
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 0 4.95 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ plot.caption         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         :Class 'rel'  num 0.9
##   ..$ hjust        : num 1
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 4.95 0 0 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ plot.margin          :Classes 'margin', 'unit'  atomic [1:4] 5.5 5.5 5.5 5.5
##   .. ..- attr(*, "valid.unit")= int 8
##   .. ..- attr(*, "unit")= chr "pt"
##  $ strip.background     :List of 5
##   ..$ fill         : chr "grey85"
##   ..$ colour       : chr "grey20"
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ strip.placement      : chr "inside"
##  $ strip.text           :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : chr "grey10"
##   ..$ size         :Class 'rel'  num 0.8
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ strip.text.x         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 5.5 0 5.5 0
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ strip.text.y         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : num -90
##   ..$ lineheight   : NULL
##   ..$ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 5.5 0 5.5
##   .. .. ..- attr(*, "valid.unit")= int 8
##   .. .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ strip.switch.pad.grid:Class 'unit'  atomic [1:1] 0.1
##   .. ..- attr(*, "valid.unit")= int 1
##   .. ..- attr(*, "unit")= chr "cm"
##  $ strip.switch.pad.wrap:Class 'unit'  atomic [1:1] 0.1
##   .. ..- attr(*, "valid.unit")= int 1
##   .. ..- attr(*, "unit")= chr "cm"
##  - attr(*, "class")= chr [1:2] "theme" "gg"
##  - attr(*, "complete")= logi TRUE
##  - attr(*, "validate")= logi TRUE
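
The full listing is long, so it is often easier to pull out only the settings of interest. The following is a minimal sketch, assuming the theme being inspected is theme_bw() (an assumption; substitute whichever theme you printed). The variable name th is only illustrative.

```r
library(ggplot2)

# Assumption: theme_bw() stands in for the theme printed above.
th <- theme_bw()

# A theme is a named list, so the element names can be listed directly
head(names(th))

# Individual settings can be extracted with $ instead of reading str() output
th$legend.position
th$panel.grid.minor
```

Any element that prints as NULL inherits its value from a more general element, for example axis.text.x falling back on axis.text.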

Elements

Create a theme, or modify an existing one.

Themes are made up of elements which can be one of:

  • element_line
  • element_text
  • element_rect
  • element_blank

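Each element type styles a different kind of component: element_line for lines, element_text for text, element_rect for backgrounds and borders, and element_blank to draw nothing at all.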
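
As a rough illustration of the four element types, the sketch below builds one of each and applies them to a scatterplot of the diamonds data. The object names (line_el, text_el, and so on) and the specific colours are arbitrary choices for the example, not settings from the slides.

```r
library(ggplot2)

# One object of each element type
line_el  <- element_line(colour = "grey50", linetype = "dashed")
text_el  <- element_text(face = "bold", size = 14)
rect_el  <- element_rect(fill = "grey95", colour = NA)
blank_el <- element_blank()

# Each theme setting expects a particular element type
p_elements <- qplot(carat, price, data = diamonds) +
  theme(
    panel.grid.major = line_el,   # lines
    axis.title       = text_el,   # text
    panel.background = rect_el,   # rectangle
    panel.grid.minor = blank_el   # draw nothing
  )
p_elements
```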

Modifying Elements

  • Axis: axis.line, axis.text.x, axis.text.y, axis.ticks, axis.title.x, axis.title.y
  • Legend: legend.background, legend.key, legend.text, legend.title
  • Panel: panel.background, panel.border, panel.grid.major, panel.grid.minor
  • Strip: strip.background, strip.text.x, strip.text.y
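
The sketch below changes one setting from each of the four groups on a faceted, coloured scatterplot. The particular styling values are arbitrary and only meant to show where each group of settings takes effect.

```r
library(ggplot2)

# Facets and a colour legend make the strip and legend settings visible
p_groups <- qplot(carat, price, data = diamonds,
                  colour = cut, facets = ~ color) +
  theme(
    axis.text.x      = element_text(angle = 45, hjust = 1), # Axis
    legend.title     = element_text(face = "bold"),         # Legend
    legend.key       = element_rect(fill = "white"),        # Legend
    panel.grid.minor = element_blank(),                     # Panel
    strip.background = element_rect(fill = "grey90"),       # Strip
    strip.text.x     = element_text(face = "italic")        # Strip
  )
p_groups
```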

Modifying a plot

p <- qplot(carat, price, data = diamonds) + 
    ggtitle("Price vs Carat for Diamonds")
p + theme(plot.title = element_text(colour = "red", angle = 20))

Use this power wisely

Removing Axes

It’s also possible to remove the axis text, titles, and ticks (helpful for maps):

p + theme(
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks.length = unit(0, "cm")
)
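
The call above leaves the grid lines and panel background in place. If those should go as well, one option is ggplot2's built-in theme_void(), which is a complete theme with almost everything blanked out. This is a sketch of an alternative, not the approach used in the slides.

```r
library(ggplot2)

p <- qplot(carat, price, data = diamonds) +
  ggtitle("Price vs Carat for Diamonds")

# theme_void() drops axes, grid lines, and the panel background in one step
p_void <- p + theme_void()
p_void
```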

Saving your Work

The ggsave() function saves the last plot produced, choosing the file format from the file extension:

qplot(total_bill, tip, data = tips)

ggsave("tips.png")
ggsave("tips.pdf")
ggsave("tips.png", width = 6, height = 6)
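
By default width and height are given in inches; the units argument switches to centimetres or millimetres. The sketch below uses the diamonds data and a temporary directory so that it runs without the tips data set; the file name is hypothetical.

```r
library(ggplot2)

# The most recently created plot becomes the one ggsave() picks up
qplot(carat, price, data = diamonds)

# Hypothetical output location inside the session's temporary directory
pdf_path <- file.path(tempdir(), "diamonds_last.pdf")

# Size given in centimetres rather than the default inches
ggsave(pdf_path, width = 15, height = 10, units = "cm")

file.exists(pdf_path)
```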

Or explicitly tell it which plot to save:

dplot <- qplot(total_bill, tip, data = tips)
ggsave("tips.png", plot = dplot, dpi = 72)
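
Passing plot = matters most when the plot you want is no longer the last one created. In the sketch below (object and file names are illustrative), a histogram is built after the scatterplot, yet the scatterplot is the one written to disk.

```r
library(ggplot2)

# Two plots stored as objects; the histogram is the most recent one
scatter   <- qplot(carat, price, data = diamonds)
hist_plot <- qplot(carat, data = diamonds, geom = "histogram", binwidth = 0.1)

# Hypothetical output location inside the session's temporary directory
scatter_path <- file.path(tempdir(), "diamonds_scatter.pdf")

# plot = scatter saves the scatterplot, not the more recent histogram
ggsave(scatter_path, plot = scatter, width = 6, height = 4)

file.exists(scatter_path)
```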

Your Turn

  1. Save a PDF of a scatterplot of price vs carat
  2. Open the PDF in Adobe Acrobat (or another PDF reader)
  3. Save a PNG of the same scatterplot

Solutions

qplot(carat, price, data = diamonds)

ggsave("diamonds.pdf")
## Saving 7 x 4 in image

ggsave("diamonds.png")
## Saving 7 x 4 in image

diamonds.pdf

diamonds.png